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1 Introduction 

The process of gravitational instability initiated by small primordial den- 
sity perturbations is a vital ingredient of cosmological models that attempt 
to explain how galaxies and large-scale structure formed in the Universe. 
In the standard picture (the "concordance" model), a period of accelerated 
expansion ("inflation") generated density fluctuations with simple statisti- 
cal properties through quantum processes (Starobinsky 1979, 1980, 1982; 
Guth 1981; Guth & Pi 1982; Albrecht & Steinhardt 1982; Linde 1982). In 
this scenario the primordial density field is assumed to form a statistically 
homogenous and isotropic Gaussian Random Field (GRF). Over years of ob- 
servational scrutiny this paradigm has strengthened its hold in the minds of 
cosmologists and has survived many tests, culminating in those furnished by 
the Wilkinson Microwave Anisotropy Probe (WMAP; Bennett et al. 2003; 
Hinshaw et al 2003). 

Gaussian random fields are the simplest fully-defined stochastic pro- 
cesses (Adler 1981; Bardeen et al. 1986), which makes their analysis rel- 
atively straightforward. Robust and powerful statistical descriptors can be 
constructed that have a firm mathematical underpinning and are relatively 
simple to implement. Second-order statistics such as the ubiquitous power- 
spectrum (e.g. Peacock & Dodds 1996) furnish a complete description of 
Gaussian fields. They have consequently yielded invaluable insights into the 
behaviour of large-scale structure in the latest generation of redshift surveys, 
such as the 2dFGRS (Percival et al. 2001). Important though these methods 
undoubtedly are, the era of precision cosmology we are now entering requires 
more thought to be given to methods for both detecting and exploiting de- 
partures from Gaussian behaviour. 

Even if the primordial density fiuctuations were indeed Gaussian, the later 
stages of gravitational clustering must induce some form of non-linearity. One 
particular way of looking at this issue is to study the behaviour of Fourier 
modes of the cosmological density field. If the hypothesis of primordial Gaus- 
sianity is correct then these modes began with random spatial phases. In the 
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early stages of evolution, the plane- wave components of the density evolve in- 
dependently like linear waves on the surface of deep water. As the structures 
grow in mass, they interact with other in non-linear ways, more like waves 
breaking in shallow water. These mode-mode interactions lead to the gener- 
ation of coupled phases. While the Fourier phases of a Gaussian field con- 
tain no information (they are random), non-linearity generates non-random 
phases that contain much information about the spatial pattern of the fluc- 
tuations. Although the significance of phase information in cosmology is still 
not fully understood, there have been a number of attempts to gain quanti- 
tative insight into the behaviour of phases in gravitational systems. Ryden 
& Gramann (1991), Soda & Suto (1992) and Jain & Bcrtschingcr (1998) 
concentrated on the evolution of phase shifts for individual modes using per- 
turbation theory and numerical simulations. An alternative approach was 
adopted by Schcrrcr, Mclott & Shandarin (1991), who developed a practi- 
cal method for measuring the phase coupling in random fields that could be 
applied to real data. Most recently Chiang & Coles (2000), Coles & Chiang 
(2000), Chiang (2001) and Chiang, Naselsky & Coles (2002) have explored 
the evolution of phase information in some detail. 

Despite this recent progress, there is still no clear understanding of how 
the behaviour of the Fourier phases manifests itself in more orthodox statis- 
tical descriptors. In particular there is much interest in the usefulness of the 
simplest possible generalisation of the (second-order) power-spectrum, i.e. 
the (third-order) bispcctrum (Peebles 1980; Scoccimarro ct al. 1998; Scoc- 
cimarro, Couchman & Frieman 1999; Verde et al. 2000; Verde et al. 2001; 
Verde et al. 2002). Since the bispectrum is identically zero for a Gaussian 
random field, it is generally accepted that the bispectrum encodes some form 
of phase information but it has never been elucidated exactly what form of 
correlation it measures. Further possible generalisations of the bispectrum are 
usually called polyspectra; they include the (fourth-order) trispcctrum (Verde 
& Heavens 2001) or a related but simpler statistic called the second-spectrum 
(Stirling & Peacock 1996). Exploring the connection between polyspectra and 
non-lincarly induced phase association is one of the aims of this paper. 

Gravitational instability is expected to generate phase correlations (and 
non-Gaussianity) even if the primordial fluctuations were Gaussian. The Cos- 
mic Microwave Background (CMB) allows us to probe the fluctuations while 
they are still in the linear regime and thus test the level of primordial non- 
Gaussianity without having to worry about non-linear effects. A second aim 
of this paper is to explain how one can use phase correlations in spherical har- 
monic expansions of temperature fluctuations in order to detect departures 
from standard fluctuation statistics. 

Finally I discuss the use of topological invariants such as the Euler- 
Poincare characteristic of isodensity contours to assess the level of non- 
Gaussianity in large-scale structure. 
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2 Basic Statistical Tools 

I start by giving some general definitions of concepts which I will later use in 
relation to the particular case of cosmological density fields. In order to put 

our results in a clear context, I develop the basic statistical description of 
cosmological density fields; see also, e.g., Peebles (1980) and Coles & Lucchin 
(2002). 

2.1 Fourier Description 

I follow standard practice and consider a region of the Universe having vol- 
ume Vu, for convenience assumed to be a cube of side L ^ Ig, where Is is 
the maximum scale at which there is significant structure due to the pertur- 
bations. The region Vu can be thought of as a "fair sample" of the Universe 
if this is the case. It is possible to construct, formally, a "realisation" of the 
Universe by dividing it into cells of volume Vu with periodic boundary con- 
ditions at the faces of each cube. This device is often convenient, but in any 
case one often takes the limit Vu ^ oo. Let us denote by p the mean density 
in a volume Vu and take p(x) to be the density at a point in this region 
specified by the position vector x with respect to some arbitrary origin. As 
usual, the fluctuation is defined to be 

^(x) = [p(x)-p]/p. (1) 

We assume this to be expressible as a Fourier series: 

^(x) = ^ (5k exp(ik-x)=^ exp(— ik-x); (2) 

k k 

the appropriate inverse relationship is of the form 

5k = J— 1 S{x) exp(— ik • x)dx. (3) 

Jvu 

The Fourier coefficients Sk are complex quantities, 

^k = l^kl exp {i(f)u.), (4) 

with amplitude |^k| and phase (/)k. The assumption of periodic boundaries 

results in a discrete k-space representation; the sum is taken from the Nyquist 
frequency /cNy = 2tt/L, where Vu = L^, to infinity. Note that as L ^ oo, 
^Ny — * 0. Conservation of mass in Vu implies ^k=o = and the reality of ^(x) 
requires = (5_k- 

If, instead of the volume Vu, we had chosen a different volume Vu the 
perturbation within the new volume would again be represented by a series of 

the form (2), but with different coefficients (5k. Now consider a (large) number 
A'' of realisations of our periodic volume and label these realisations by Vui, 
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Vu2, Vu3j •••) VuN- It is meaningful to consider the probability distribution 
of the relevant coefficients Sy^ from realisation to realisation across 
this ensemble. One typically assumes that the distribution is statistically 
homogeneous and isotropic, in order to satisfy the Cosmological Principle, 
and that the real and imaginary parts of 6k have a Gaussian distribution and 
are mutually independent, so that 

where w stands for cither Re [S^^] or Im [5]^] and = ct^ is the spec- 

trum. This is the same as the assumption that the phases 0k in equation 
(5) are mutually independent and randomly distributed over the interval be- 
tween = and = 27r. In this case the moduli of the Fourier amplitudes 
have a Rayleigh distribution: 



?'(|<5k|,0k)rf|5k|#k = 




Because of the assumption of statistical homogeneity and isotropy, the quan- 
tity P{Sw) depends only on the modulus of the wavevector k and not on its 
direction. It is fairly simple to show that, if the Fourier quantities |(5k| have 
the Rayleigh distribution, then the probability distribution P{6) of 6 = 6{x) 
in real space is Gaussian, so that: 

where cr^ is the variance of the density field <5(x). This is a strict definition 
of Gaussianity. However, Gaussian statistics do not always require the dis- 
tribution (7) for the Fourier component amplitudes. According to its Fourier 
expansion, J(x) is simply a sum over a large number of Fourier modes whose 
amplitudes are drawn from some distribution. If the phases of each of these 
modes are random, then the Central Limit Theorem will guarantee that the 
resulting superposition will be close to a Gaussian if the number of modes is 
large and the distribution of amplitudes has finite variance. Such fields are 
called weakly Gaussian. 



2.2 Covariance Functions &: Probability Densities 

I now discuss the real-space statistical properties of spatial perturbations in 
p. The covariance function is defined in terms of the density fluctuation by 

^^^^ ^ ([p(x)-p-]Kx + r)-p-]) ^ ^ 
P 
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The angle brackets in this expression indicate two levels of averaging: first 
a volume average over a representative patch of the universe and second an 
average over different patches within the ensemble, in the manner of §2.1. 
Applying the Fourier machinery to equation (8) one arrives at the Wiener- 
Khintchin theorem, relating the covariance to the spectral density function 
or power spectrum, P[k): 

C(r)=5^(|5k|')expHk.r), (9) 

k 

which, in passing to the limit Vu ^ oo, becomes 

^^'■^ ^Wfl ^^^^ ^"^^''^ ■ 

Averaging equation (9) over r gives 

(^W)r = ^ E(l^kP) I cxp(-ik . r)dr = 0. (11) 

k 

The function ^(r) is the two-point covariance function. In an analogous 
manner it is possible to define spatial covariance functions for iV > 2 points. 
For example, the three-point covariance function is 

^ (bW - p][p(x + r) - p][p{x + s) - p]} ^^2) 

which gives 

C(r,s) =((5(x)(5(x + r)(5(x + s)), (13) 

where the spatial average is taken over all the points x and over all directions 
of r and s such that |r — s| = t: in other words, over all points defining a 
triangle with sides r, s and t. The generalisation of (12) to > 3 is obvious. 

The covariance functions arc related to the moments of the probability 
distributions of (5(x). If the fluctuations form a Gaussian random field then 
the N-variate distributions of the set 6i = d{xi) are just multivariate Gaus- 
sians of the form 

The correlation matrix can be expressed in terms of the covariance func- 
tion 

Cij = {SiSj)=C{rij). (15) 

It is convenient to go a stage further and define the N-point connected co- 
variance functions as the part of the average (5i...^jv) that is not expressible 
in terms of lower order functions e.g. 
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{S162S3} = {Sl}c{S2S3}c + {S2}c{Sl63)c + {63}c{Sl62}c + {Sl}c{S2}c{S3)c+{Sl6263)c, 

(16) 

where the connected parts are {616263)0, {6162)0, etc. Since {6) = by con- 
struction, {6i)c = {61) = 0. Moreover, {6162)0 = {6162) and {616263)0 = 
{616263). The second and tliird order connected parts are simply the same 
as the covariance functions. Fourth and higher order quantities are different, 
however. The connected functions are just the multivariate generahsation of 
the cumulants kn (KcndaU & Stewart 1977). One of the most important 
properties of Gaussian fields is that all of their N-point connected covari- 
ances are zero beyond N=2, so that their statistical properties are fixed once 
the set of two point covariances (15) is determined. All large-scale statistical 
properties are therefore determined by the asymptotic behaviour of ^(r) as 
r — > 00. 



3 Phase Coupling 

In §2 we pointed out that a convenient definition of a Gaussian field could 
be made in terms of its Fourier phases, which should by independent and 
uniformly distributed on the interval [0, 2tt] . A breakdown of these conditions, 
such as the correlation of phases of difi^erent wavemodes, is a signature that 
the field has become non-Gaussian. In terms of cosmic large-scale structure 
formation, non-Gaussian evolution of the density field is symptomatic of the 
onset of non-linearity in the gravitational collapse process, suggesting that 
phase evolution and non-linear 

evolution are closely linked. A relatively simple picture emerges for models 
where the primordial density fiuctuations are Gaussian and the initial phase 
distribution is uniform. When perturbations remain small evolution proceeds 
linearly, individual modes grow independently and the original random phase 
distribution is preserved. However, as perturbations grow large their evolu- 
tion becomes non-linear and Fourier modes of diff'erent wavenumber begin to 
couple together. This gives rise to phase association and consequently to non- 
Gaussianity. It is clear that phase associations of this type should be related 
in some way to the existence of the higher order connected covariance func- 
tions, which are traditionally associated with non-linearity and are non-zero 
only for non-Gaussian fields. In this sections such a relationship is explored 
in detail using an analytical model for the non-linearly evolving density fluc- 
tuation field. Phase correlations of a particular form are identified and their 
connection to the covariance functions is established. 

A graphic demonstration of the importance of phases in patterns generally 
is given in Figure 1. Since the amplitude of each Fourier mode is unchanged 
in the phase reshuffling operation, these two pictures have exactly the same 
power-spectrum, P{k) oc |(5(k)p. In fact, they have more than that: they 
have exactly the same amplitudes for all k. They also have totally different 
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Fig. 1. Numerical simulation of galaxy clustering (left) together with a version 
generated randomly reshuffling the phases between Fourier modes of the original 
picture (right). 

morphology. Further demonstrations of the importance of Fourier phases in 
defining clustering morphology are given by Chiang (2001). 

3.1 Quadratic density fields 

It is useful at this stage to a particular form of non-Gaussian field that serves 
both as a kind of phenomenological paradigm and as a reasonably realistic 
model of non- linear evolution from Gaussian initial conditions. The model 
involves a field which is generated by a simple quadratic transformation of 
a Gaussian distribution, hence the term quadratic non-linearity. Quadratic 
fields have been discussed before from a number of contexts (e.g. Coles & 
Barrow 1987; Moscardini et al. 1991; Falk, Rangarajan & Srednicki 1993; Luo 
& Schramm 1993; Luo 1994; Gangui et al. 1994; Koyoma, Soda & Taruya 
1999; Peebles 1999a,b; Matarrese, Verde & Jimenez 2000; Verde et al. 2000; 
Verde et al. 2001; Komatsu & Spergel 2001; Shandarin 2002; Bartolo, Matar- 
rese & Riotto 2002); for further discussion see below. The motivation is very 
similar to that of Coles & Jones (1991), which introduced the lognormal den- 
sity field as an illustration of some of the consequences of a more extreme 
form of non-linearity involving an exponential transformation of the linear 
density field. 

3.2 A simple non-linear model 

We adopt a simple perturbative expansion of the form 



<5(x) = ,5i(x) + e52(x) 



(17) 
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to mimic the non-linear evolution of the density field. Although the equiv- 
alent transformation in formal Eulcrian perturbation theory is a good deal 
more complicated, the kind of phase associations that we will deal with here 
are precisely the same in either case. In terms of the Fourier modes, in the 
continuum limit, we have for the first order Gaussian term 

Si{x) = J d'^fc |(5k;| exp [z(^k] exp [ik • x] (18) 

and for the second-order perturbation 

52(x) = [<5i(x)]' = J d^kSk' |5k||(5k'|exp[z(0k + 0k')] exp[z(k + k') -r]. 

(19) 

The quadratic field, 82, illustrates the idea of mode coupling associated with 
non-linear evolution. The non-linear field depends on a specific harmonic 
relationship between the wavenumber and phase of the modes at k and k'. 
This relationship between the phases in the non-linear field, i.e. 

<?f>k + 4>\^' = <f>k+i<.', (20) 

where the RHS represents the phase of the non-linear field, is termed 
quadratic phase coupling. 



3.3 The two-point covariance function 

The two-point covariance function can be calculated using the definitions of 
§2, namely 

^(r) = (5(x)^(x + r)). (21) 

Substituting the non- linear transform for 5(x) (equation 17) into this expres- 
sion gives four terms 

^(r) = (^i(x)5i(x+r))+e(5i(x)52(x+r))+e(52(x)5i(x+r))+e2(52(x)52(x+r)). 

(22) 

The first of these terms is the linear contribution to the covariance function 
whereas the remaining three give the non-linear corrections. We shall focus 
on the lowest order term for now. 

As we outlined in Section 2, the angle brackets () in these expressions are 
expectation values, formally denoting an average over the probability distri- 
bution of S{x). Under the fair sample hypothesis we replace the expectation 
values in equation (21) with averages over a selection of independent volumes 
so that OvqI j-gaj. The first average is simply a volume integral over a 
sufficiently large patch of the universe. The second average is over various 
realisations of the 5i~ and in the different patches. Applying these rules to 
the first term of equation (22) and performing the volume integration gives 
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eii(r)= j d^kd^k' (|5k||<5k'|exp[i(0k + </'k')]>real^i?(k + k')exp[ik'-s], 

(23) 

whore Sd is the Dirac delta function. The above expression is simplified given 
the reality condition 

<5k - (24) 
from which it is evident that the phases obey 

</>k + ^-k = mod[27r]. (25) 

Integrating equation (23) one therefore finds that 

Cn(r) - J d^'k (|(5kP)realCxpHk-s]. (26) 

so that the final result is independent of the phases. Indeed this is just the 
Fourier transform relation between the two-point covariance function and the 
power spectrum we derived in §2.1. 

3.4 The three-point covariance function 

Using the same arguments outlined above it is possible to calculate the 3- 
point connected covariance function, which is defined as 

C(r, s) = (<5(x)5(x + r)d{K + s)),. (27) 

Making the non-linear transform of equation (17) one finds the following 
contributions 

C(r, s) = ((5i(x)(5i (x + r)(5i(x + s))^ + e((5i(x)5i (x + r)52(x + s))e 
+perms(121, 211) + e^{Si{x.)S2ix + r)S2{x + s))^ 
+perms(212, 221) + €^{S2{x)S2{x + r)S{x + s))^. (28) 

Again we consider first the lowest order term. Expanding in terms of the 
Fourier modes and once again replacing averages as prescribed by the fair 
sample hypothesis gives 

Ciii(r,s)= J d^kd^k'd^k" (|(5k||(5k'||'5k"|cxp[i(0k + </'k' +<?!'k")])real 

X(5z3(k + k' + k") exp \ik' ■ r] exp [ik" • s]. (29) 

Recall that Si is a Gaussian field so that (j)^, (/)k' and (^k" arc independent 
and uniformly random on the interval [0, 27r]. Upon integration over one of 
the wavevectors the phase terms is modified so that its argument contains 
the sum ((/)k + (j)^' + (fi-k-k"), or a permutation thereof. Whereas the re- 
ality condition of equation (24) implies a relationship between phases of 
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anti-parallel wavevectors, no such conditions hold for modes linked by the 
triangular constraint imposed by the Dirac delta function. In other words, 
except for serendipity, 

0k + <f>k' + (t>-k-k" + 0. (30) 

In fact due to the circularity of phases, the resulting sum is still just uniformly 
random on the interval [0, 27r] if the phases are random. Upon averaging over 
sufficient realisations, the phase term will therefore cancel to zero so that the 
lowest order contribution to the 3-point function vanishes, i.e. Ciii(r, s) = 0. 
This is not a new result, but it does explicitly illustrate how the vanishing of 
the three-point connected covariance function arises in terms of the Fourier 
phases. 

Next consider the first non-linear contribution to the 3-point function 
given by 

Cii2(r, s) = e(^i(x)^i(x + r)^2(x + s)), (31) 

or one of its permutations. In this case one of the arguments in the average 
is the field (52 (x), which exhibits quadratic phase coupling of the form (20). 
Expanding this term to the point of equation (29) using the definition (19) 
one obtains 

Cii2(r,s) = J d^kd^k'd^k"d^k"' 

{\Sk\\6k'\\6k"\\6k"'\e-x.p [i{4)k. + 4>k' + 4>k." + </'k"')])real 
xfe(k + k' + k" + k"') 

X exp [ik' • r] exp [i(k" + k'") • s] . (32) 

Once again the Dirac delta function imposes a general constraint upon the 

configuration of wavevectors. Integrating over one of the k gives k'" = 
— k — k' — k" for example, so that the wavevectors must form a closed loop. 
This general constraint however, does not specify a precise shape of loop, 
instead the remaining integrals run over all of the different possibilities. At 
this point we may constrain the problem more tightly by noting that most 
combinations of the k will contribute zero to C(ii2)- This is because of the 
circularity property of the phases and equation (30). Indeed, the only nonzero 
contributions arise where we are able to apply the phase relation obtained 
from the reality constraint, equation (25). In other words the properties of the 
phases dictate that the wavevectors must align in anti-parallel pairs: k = — k', 
k" = -k'" and so forth. 

There is a final constraint that must be imposed upon the k if ^ is the 
connected 3-point covariance hmction. In a graph theoretic sense, the general 
(unconnected) A^-point function {di^ (xi)(5(2 (x2)...(5(jv (xat)) can be represented 
geometrically by a sum of tree diagrams. Each diagram consists of N nodes 
of order li, representing the (5;.(xj), and a number of linking lines denoting 
their correlations; see Fry (1984) or Bernardeau (1992) for more detailed 
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accounts. Every node is made up of k internal points, which represent a 
factor = I I exp (iipk) in the Fourier expansion. According to the rules for 
constructing diagrams, linking lines may join one internal point to a single 
other, either within the same node or in an external node. The connected 
covariancc functions arc represented specifically by the subset of diagrams 
for which every node is linked to at least one other, leaving none completely 
isolated. This constraint implies that certain pairings of wavevectors do not 
contribute to the connected covariance function. For more details, see Watts 
& Coles (2002). 

The above constraints may be inserted into equation (32) by re- writing the 

Dirac delta function as a product over Delta functions of two arguments, ap- 
propriately normalised. There are only two allowed combinations of wavevec- 
tors so we have 

5,5(k + k' + k" + k"')^;r^[(5i3(k + k")5D(k"' + k"')+(5D(k + k"')(5D(k' + k")]. 

(33) 

Integrating over two of the k and using equation (25) eliminates the phase 
terms and leaves the final result 

Cii2(r,s) = :^ J Skd^k' (|<5k|'|(5kf)reaiexp[ik'-r]expH(k + k')-s]. 

(34) 

The existence of this quantity has therefore been shown to depend on the 
quadratic phase coupling of Fourier modes. The relationship between modes 
and the interpretation of the tree diagrams is also dictated by the properties 
of the phases. 

One may apply the same rules to the higher order terms in equation (28). 

It is immediately clear that the C122 terms are zero because there is no way 
to eliminate the phase term exp [«(^k + 4>w + 0k" + 0k'" + 0k"")]) a conse- 
quence of the property equation (30). Diagrammatically this corresponds to 
an unpaired internal point within one of the nodes of the tree. The final, 
highest order contribution to the 3-point function is found to be 

C222(r,s) = ^ j d^kd^k'd^k" (|5kn^kf|^k"P)real 

X exp [z(k - k') • r] exp [i(k' - k") • s], (35) 

where the phase and geometric constraints allow 12 possible combinations of 
wavevectors. 

3.5 Power-spectrum and Bispectrum 

The formal development of the relationship between covariance functions 
and power-spectra developed above suggests the usefulness of higher-order 

versions of P{k). It is clear from the above arguments that a more convenient 
notation for the power-spectrum than that introduced in §2.1 is 
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{5M = (27r)3p(fc)(5,5(k + k'). (36) 

The connection between phases and higher-order covariance functions ob- 
tained above also suggests defining higher-order polyspectra of the form 

{S^Sk' . . . 4(n)) = (27r)3p„(k, k', . . . k("))5o(k + k' + . . . k(")) (37) 

where the occurrence of the delta-function in this expression arises from a 
generalisation of the reality constraint given in equation (25); see, e.g., Pee- 
bles (1980). Conventionally the version of this with n = 3 produces the 
bispectrum, usually called _B(k, k',k") which has found much effective use 
in recent studies of large-scale structure (Peebles 1980; Scoccimarro et al. 
1998; Scoccimarro, Couchman & Frieman 1999; Verde et al. 2000; Verde et 
al. 2001; Verde et al. 2002). It is straightforward to show that the bispectrum 
is the Fourier-transform of the (reduced) three-point covariance function by 
following similar arguments; see, e.g., Peebles (1980). 

Note that the delta-function constraint requires the bispectrum to be 
zero except for fc-vcctors (k, k', k") that form a triangle in fc-space. It is 
clear that the bispectrum can only be non-zero when there is a definite re- 
lationship between the phases accompanying the modes whose wave-vectors 
form a triangle. Moreover the pattern of phase association necessary to pro- 
duce a real and non-zero bispcctrmn is precisely that which is generated by 
quadratic phase association. This shows, in terms of phases, why it is that 
the leading order contributions to the bispectrum emerge from second-order 
fluctuations of a Gaussian random field. The bispectrum measures quadratic 
phase coupling. 

Three-point phase correlations have another interesting property. While 
the bispectrum is usually taken to be an ensemble-averaged quantity, as de- 
fined in equation (37), it is interesting to consider products of terms (5k'5k'^k" 
obtained from an individual realisation. According to the fair sample hypoth- 
esis discussed above we would hope appropriate averages of such quantities 
would yield an estimate of the bispectrum. Note that 

(5k(5k'5k" = ^k<5k"^-k-k' = <5k(5k"5k+k' = /5(k, k'), (38) 

using the requirement (25), together with the triangular constraint we dis- 
cussed above. Each /3(k, k') will carry its own phase, say (/>k,k', which obeys 

(pk,]!.' =(!)]<. + (t>k' - (t^k+k' ■ (39) 

It is evident from this that it is possible to recover the complete set of phases 

(/)k from the bispectral phases (/)k.k' , up to a constant phase offset correspond- 
ing to a global translation of the entire structure (Chiang & Coles 2000). This 
furnishes a conceptually simple method of recovering missing or contaminated 
phase information in a consistent way, an idea which has been exploited, for 
example, in speckle interferometry (Lohmann, Weigelt & Wirnitzer 1983). In 
the case of quadratic phase coupling, described by equation (20), the left- 
hand-side of equation (39) is identically zero leading to a particularly simple 
approach to this problem. 
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4 Phase Correlations in the CMB 

Since the release of the first (prehminary) WMAP data set it has been sub- 
jected to a number of detailed independent analyses that have revealed some 
surprising features. Erikscn ct al. (2004) have pointed out the existence of a 
North-South asymmetry suggesting that the cosmic microwave background 
(CMB) revealed by the WMAP data is not statistically homogeneous over the 
celestial sphere. This is consistent with the results of Coles et al. (2004) who 
found evidence for phase correlations in the WMAP data; see also Hajian & 
Souradeep (2003) and Hajian, Souradeep & Cornish (2004). The low-order 
multipoles of the CMB also display some peculiarities (dc Oliveira-Costact al. 
2004a; Efstathiou 2004). Vielva et al. (2004) found significant non-Gaussian 
behaviour in a wavelet analysis of the same data, as did Chiang et al. (2004), 
Larson & Wandelt (2004) and Park (2004). Other analyses of the statistical 
properties of the WMAP have yielded results consistent with the standard 
form of fluctuation statistics (Komatsu et al. 2003; Colley & Gott 2003). 
These iimisual properties may well be generated by residual foreground con- 
tamination (Banday et al. 2003; Naselsky et al. 2003; de Oliveira-Costa et al. 
2004; Dineen & Coles 2004) or other systematic effects, but may also provide 
the first hints of physics beyond the standard cosmological model. 

In order to tap the rich source of information provided by future CMB 
maps it is important to devise as many independent statistical methods as 
possible to detect, isolate and diagnose the various possible causes of de- 
partures from standard statistics. One particularly fruitful approach is to 
look at the behaviour of the complex coefficients that arise in a spherical 
harmonic analysis of CMB maps. Chiang et al. (2004), Chiang, Naselsky & 
Coles (2004), and Coles et al. (2004) have focussed on the phases of these 
coefficients on the grounds that a property of a statistically homogenous and 
isotropic GRF is that these phases arc random. Phases can also be use to test 
for the presence of primordial magnetic fields (Chen et al. 2004; Naselsky et 
al. 2004) or evidence of non- trivial topology (Dineen, Rocha & Coles 2004). 

4.1 Spherical Harmonics and Gaussian Fluctuations 

We can describe the distribution of fluctuations in the microwave background 
over the celestial sphere using a sum over a set of spherical harmonics: 



Here A{6, (j)) is the departure of the temperature from the average at angular 

position (9, (f) on the celestial sphere in some coordinate system, usually 
galactic. The Yim{0,(j>) are spherical harmonic functions which we define in 
terms of the Legendre polynomials Pim using 




oo m—+l 



(40) 



l=\ m=—l 
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Y,^{e,ct>) = i-^r f^^^^lif^^^' Pirn{cos0)e'"'^, (41) 

i.e. we use the Condon-Shortley phase convention. In Equation (1), the ai^m 
are complex coefficients which can be written 

ai,m = Xl^rn + ijjl.m = \aLrn\ CXp[i(/!)(,„] . (42) 

Note that, since A is real, the definitions (40) & (41) requires the following 
relations between the real and imaginary parts of the ai^m' if m is odd then 

'^L^m — ^i^l,m) — ^(*^/, — m) — '^l^—mi 

yi,m = '^{ai,m) = '^{ai-m) = Vl-m; (43) 

while if m is even 

*^/,m — ^('^/,m) — ^(^/, — m) — m? 

yi,m = S(a(,m) = -3(a( _m) = yi-m; (44) 
and if m is zero then 

Qiai,m) = yi,o = 0. (45) 

From this it is clear that the m = mode always has zero phase, and there 
arc consequently only I independent phase angles describing the harmonic 
modes at a given I. Without loss of information we can therefore restrict our 
analysis to m > 0. 

If the primordial density fluctuations form a Gaussian random field in 
space the temperature variations induced across the sky form a Gaussian 
random field over the celestial sphere. This means that 

{ai,mal,m') = '^l^ll'^mm', (46) 

where C; is the angular power spectrum, the subject of much scrutiny in 
the context of the cosmic microwave background (e.g. Hinshaw et al. 2003), 
and Sxx' is the Kronecker delta function. Since the phases are random, the 
stochastic properties of a statistically homogeneous and isotropic Gaussian 
random field are fully specified by the Ci, which determines the variance of 
the real and imaginary parts of ai^m both of which are Gaussian: 

cr'^ixi^m) = (T^{yi,m) = (rf = ^Ci. (47) 
4.2 Testing for Phase Correlations 

The approach we take is to assume that we have available a set of phases 
(j)i^rn corresponding to a set of spherical harmonic coefficients ai^m obtained 
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from a data set, either real or simulated. We can also form phase differences 
in according to 

An(0 = 't'Lm+l - 4>l,m- (48) 

If the orthodox cosmological interpretation of temperature fluctuations is cor- 
rect, the phases of the a;,^ should be random and so should phase differences 
of the form (pi^m+i — 0(,m and (j)i+i^rn — 4'i,in- Let us assume, therefore, that 
we have n generic angles, 9i . . .9n- Under the standard statistical assumption 
these should be random, apart from the constraints described in the previous 
section. The first thing we need is a way of testing whether a given set of 
phase angles is consistent with being drawn from uniform distribution on the 
unit circle. This is not quite as simple as it seems, particularly if one does 
not want to assume any particular form for actual distribution of angles, such 
as a bias in a particular direction; see Fisher (1993). Fortunately, however, 
there is a fully non-parametric method available, based on the theory of order 
statistics, and known as as Kuipcr's statistic (Kuiper 1960). 

Kuiper's method revolves around the construction of a statistic, V, ob- 
tained from the data via the following prescription. First the angles are sorted 
into ascending order, to give the set {0i, . . . , 0„}. It docs not matter whether 
the angles are defined to lie in [0, 2n], [— tt, +tt] or whatever. Each angle 6i is 
divided by 2Tr to give a set of variables Xi, where i = 1 . . .n. Prom the set of 
Xi we derive two values 5+ and S~ where 

5+ = max|--Xi,--X2,...,l-X„l (49) 



n n 



and 



Sn = max \Xi,X2- -,...',Xn- \ . 

[ n n J 



(50) 



Kuiper's statistic, V, is then defined as 



V={S+ + S-)- + 0.155 (51) 

Anomalously large values of V indicate a distribution that is more clumped 
than a uniformly random distribution, while low values mean that angles are 
more regular. The test statistic is normalized by the number of variates, n, in 
such a way that standard tables can be constructed to determine significance 
levels for any departure from uniformity; see Fisher (1993). In this context, 
however, it is more convenient to determine significance levels using Monte 
Carlo simulations of the "null" hypothesis of random phases. This is partly 
because of the large number of samples available for test, but also because 
we can use them to make the test more general. 

The first point to mention is that a given set of phases, say belonging 
to the modes at fixed I is not strictly speaking random anyway, because of 
the constraints noted in the previous section. One could deal with this by 
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discarding the conjugate phases, thus reducing the number of data points, 
but thca'c is no need to do this when one can instead build the required 
symmetries into the Monte Carlo generator. 

In addition, suppose the phases of the temperature field over the celestial 
sphere were indeed random, but observations were available only over apart 
of the sky, such as when a galactic cut is applied to remove parts of the map 
contaminated by foregrounds. In this case the mask may introduce phase 
correlations into the observations so the correct null hypothesis would be more 
complicated than simple uniform randomness. As long as any such selection 
effect were known, it could be built into the Monte Carlo simulation. One 
would then need to determine whether V from an observed sky is consistent 
with having been drawn from the set of values of V generated over the Monte 
Carlo ensemble. 

There is also a more fundamental problem in applying this test to spheri- 
cal harmonic phases. This is that a given set of ai^m depends on the choice of 
a particular coordinate axis. A given sky could actually generate an infinite 
number of different sets of (j)i^m because the phase angles arc not rotationally 
invariant. One has to be sure to take different choices of z-axis into consider- 
ation when assessing significance levels, as a random phase distribution has 
no preferred axis while systematic artifacts may. A positive detection of non- 
randomness may result from a chance alignment of features with a particular 
coordinate axis in the real sky unless this is factored into the Monte Carlo 
simulations to. For both the real sky and the Monte Carlo skies we therefore 
need not a single value of V but a distribution of values obtained by rotat- 
ing the sky over all possible angles. A similar approach is taken by Hansen, 
Marinucci & Vittorio (2003). This method may seem somewhat clumsy, but 
a test is to be sensitive to departures from statistical homogeneity one should 
not base the test on measures that are rotationally invariant, such as those 
suggested by Ferreira, Magcuijo & Gorski (1998) as these involve averaging 
over the very fluctuations one is trying to detect. 

4.3 Rotating the aj,rn 

In view of the preceding discussion we need to know how to transform a 

given set of a;.,„ into a new set when the coordinate system is rotated into 
a different orientation. The method is fairly standard, but we outline it here 
to facilitate implementation of our approach. 

Any rotation of the cartesian coordinate system S{x, y, z} S'{x, y, z} 
can be described using a set of three Euler angles a, /3, 7, which define the 
magnitude of successive rotations about the coordinate axes. In terms of 
a rotation operator Z)(q;,/3, 7), defined so that a field f(r,0,(j)) transforms 
according to 

b{a, /?, 7)/(r, e, 0) = /'(r, 0, 0) = /(r, 9', (52) 
a vector r is transformed as 
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r' = D(0, 0, 7)D(0, f3, 0)D(a, 0, 0)r = D(a, p, 7)r. (53) 
Here D is a matrix representing the operator D, i.e. 

(cos 7 sin 7 \ / cos /? — sin /3 \ / cos a sin a \ 
— sin7cos7 I 1 — sinacosaO 1 . 

1/ \sin/JO cos/3 / V ^ 1/ 

(54) 

The Wigner D functions describe the rotation operator used to realise the 
transformations of covariant components of tensors with arbitrary rank I. 
The functions, written as D^^/, transform a tensor from S{x,y,z} to 
S'{x',y',z'}. Consider a tensor Yi^jn{^,(l>) defined under the coordinate sys- 
tem S and apply the rotation operator we get: 

D{a,(i,j)Yi,m{e,cl>) = Yi,m'{9',cf>') = J2Yi,m{e,^)Dl,m'{0,<t>) (55) 

This means that the transformation of the tensor under the rotation of the 
coordinate system can be represented as a matrix multiplication. Finding the 
rotated coefficients therefore requires a simple matrix multiplication once the 
appropriate D function is known. To apply this in practice one needs a fast 
and accurate way of generating the matrix elements Dl^ ^, for the rotation 
matrix. There are {21 + 1)^ elements needed to describe the rotation of each 
mode and the value of each element depends upon the particular values of 
(a, /3, 7)used for that rotation. Details of how to implement this are given in 
Coles et al. (2004). 

In order to apply these ideas to make a test of CMB fluctuations, we 
first need a temperature map from which we can obtain a measured set of 
ai^m- Employing the above transformations with some choice of Euler angles 
yields a rotated set of the ai^m- It is straightforward to choose a set of angles 
such that random orientations of the coordinate axis can be generated. Once 
a rotated set has been obtained, Kuiper's statistic is calculated from the 
relevant transformed set of phases. For example. Coles et al. (2004) generated 
3000 rotated sets of each CMB map using this kind of resampling of the 
original data, producing 3000 values of V^mb- The values of the statistic were 
then binned to form a measured (re-sampled) distribution of Vcmh- The same 
procedure is applied to the 1000 Monte Carlo sets of o;,m drawn from a 
uniformly random distribution, i.e. each set was rotated 3000 times and a 
distribution of Vmc under the null hypothesis is produced. These realizations 
were then binned to created an overall global average distribution under the 
null hypothesis. 

In order to determine whether the distribution of V^mb is compatible with 
a distribution drawn from a sky with random phases, we use a simple test, 
using 

x' = Y: ^^2^ (56) 

Ti 
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where the summation is over all the bins and fi is the number expected in 
the ith bin from the overall average distribution. The larger the value of 

the less likely the distribution functions are to be drawn from the same 
parent distribution. Values of Xmc calculated for the 1000 Monte Carlo 
distributions and xlmh calculated from the distribution of V^mb- If the value 
of xlnih greater than a fraction p of the values of Xmc ^^^^ phases 
depart from a uniform distribution at significance level p. We have chosen 95 
per cent as an appropriate level for the level at which the data are said to 
display signatures that are not characteristic of a statistically homogeneous 
Gaussian random field. 

Application of this relatively straightforward method to the WMAP first- 
year data shows the existence of phase correlations, as demonstrated in Figure 
2. 




Fig. 2. A reconstruction of the WMAP ILC made using the spherical harmonic 
mode amplitudes a;_m for / = 16 only. Our analysis method (Coles et al. 2004) 
shows that these modes at different m have correlated phases in harmonic space, 
and the reconstructed sky shows this is aligned with the Galactic Plane. 



4.4 Random Walks in Harmonic Space 

To begin with, we concentrate on a simple measure based on the distribution 
of total displacements. Consider a particular value of I. The set of values 
{^i.m} can be thought of as steps in a random walk in the complex plane, a 
structure which can be easily visualized and which has well-known statistical 
properties. 

The simplest statistic one can think of to describe the set {ai m} is the 
net displacement of a random walk corresponding to the spherical harmonic 
mode Z, i.e. 
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Ri = ^ ai,„, (57) 

m>0 

where the vector SL\,m = {xi^miyi.m) a-iid the random walk has an origin at 
aifi (which is always on the x-axis). The length of each step a;^^ = |ai ml is 
the usual spherical harmonic coefBcient described in the previous section and 
defined by equation (1). If the initial fluctuations arc Gaussian then the two 
components of each displacement are independently normal with zero mean 
and the same variance (8). Each step then has a Rayleigh distribution so that 
the probability density for ai^m to be in the range (a, a + da) is 

p(a) = ^exp(^-^). (58) 

This is a particularly simple example of a random walk (McCrea & Whipple 
1940; Chandrasekhar 1943; Hughes 1995). Since the displacements in x and 
y are independently Gaussian the next displacement after I steps is itself 
Gaussian with variance laf. The probability density of \Ri\ to be in the 
range (r, r + dr) is then itself a Rayleigh distribution of the form 

This requires an estimate of af . This can either be made using the same data 
or by assuming a given form for Ci, in which case the resulting test would 
be of a composite hypothesis that the fluctuations constitute a Gaussian 
random field with a particular spectrum. For large I this is can be done 
straightforwardly, but for smaller values the sampling distribution of i?j will 
differ significantly from (59) because of the uncertainty in population variance 
from a small sample of • This is the so-called "cosmic variance" problem. 

So far we have concentrated on fixed I with a random walk as a function 
of m. We could instead have fixed m and considered a random walk as a 
function of /. Or indeed randomly selected iV values of I and m. In either 
case the results above still stand except with a'f replaced by an average over 
all the modes considered: 

<^' = ^Y.<m- (60) 

We do not consider this case any further in this paper. 

The result (59) only obtains if the steps of the random walk arc indepen- 
dent and Gaussian. If the distribution of the individual steps is non-Gaussian, 
but the steps are independent, then the result (59) will be true for large I by 
virtue of the Central Limit Theorem. Exact results for finite I for example 
non-Gaussian distributions are given by Hughes (1995). In such cases the 
overall 2D random walk comprises two independent ID random walks in x 
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and y. The Gaussianity of the individual step components can be tested us- 
ing their empirical distributions via a Kolmogorov-Smirnov (K-S) or similar 
approach. Lack of independence of step size or step direction (i.e. phase cor- 
relations) would appear as anisotropy of their joint distribution which could 
be quantified by direct measures of cross-correlation or by testing the bivari- 
ate distribution using an appropriate 2D K-S test. The latter task is harder, 
especially if the number of modes available is small. Using the net displace- 
ment in ID or 2D corresponds to using the sum of a sample of n variables 
to test the parent distribution. This is not necessarily powerful, but is ro- 
bust and has well-defined properties. The true advantage of the random-walk 
representation is that it encapsulates the behaviour of the set {ai^m} in a 
graphical fashion which is ideal for data exploration. 

A slightly diflPerent approach is to keep each step length constant. The 
simplest way of doing this is to define 



so that each step is of unit length but in a random direction. This is precisely 
the problem posed in a famous letter by Pearson (1905) and answered one 
week later by Rayleigh (1905). In the limit of large numbers of steps the 
result maps into the previous result (59) with af = 1 by virtue of the Central 
Limit Theorem. For finite values of / there is also an exact result which can 
be derived in integral form using a method based on characteristic functions 
(Hughes 1995). The result is that the probability density for Ri to be in the 
range r,r + dr is 



The integral is only convergent for / > 2 but for I = 1 or I = 2 straightfor- 
ward alternative expressions are available (Hughes 1995). One can use this 
distribution to test for randomness of the phase angles without regard to the 
amplitudes. 

A simple test of the hypothesis that the fluctuations are drawn from a 
statistically homogeneous and isotropic Gaussian random field on the sky 
could be furnished by comparing the empirical distribution of harmonic ran- 
dom flights with the form (59). As we explained above, however, the net 
displacement of the random walk is a simple but rather crude indication of 
the properties of the {ai^m}, as it does not take into account the ordering of 
the individual steps. The possible non-Gaussian behaviour of the set {ai^m} 
is encoded not so much in the net displacement but in the shape of the ran- 
dom walk. To put this another way, there are many possible paths with the 
same net displacement, and these will have different shapes depending on the 
correlations between step size and direction. Long runs of directed steps or 
regular features in the observed structure could be manifestations of phase 
correlation (Coles et al. 2004). The graphical representation of the set {ai^m} 
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in the form illustrated by Figure 3 provides an elegant way of visualizing the 
behaviour of the harmonic modes and identifying any oddities. These could 
be quantified using a variety of statistical shape measures: moment of inertia 
(Rudnick, Beldjenna & Gaspari 1987), fractal dimension, first-passage statis- 
tics, shape statistics (e.g. Kuhn & Uson 1982), or any of the methods use 
to quantify the shape of minimal spanning trees (Barrow, Bhavsar & Son- 
oda 1985). Specific examples of correlated random walks are given in Hughes 
(1995). 




°70 -60 -50 -40 -30 -20 -10 10 20 30 40 50 60 70 
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Fig. 3. The random walk performed by the spherical harmonic coefficients for 
I = 532 in the WMAP ILC data, statistically the mode that displays the greatest 

departure from that expected under the null hypothesis. The outer circles corre- 
spond to 99.9, 99 and 95 per cent upper confidence limits s (from outer to inner); 
the inner circles are the corresponding lower limits, though the 99.9 per cent lower 
limit is too small to see. 

In practice the most convenient way to assess the significance of depar- 
tures from the relevant distribution would be to perform Monte Carlo ex- 
periments of the null hypothesis. For statistical measures more complicated 
than the net displacement, the best way to set up a statistical test is to use 
Monte-Carlo re-orderings of the individual steps to establish the confidence 
level of any departure from Gaussianity. This also enables one to incorporate 
such complications as galactic cuts. 

The WMAP team released an Internal Linear Combination (ILC) map 
that combined five original frequency band maps in such a way to maintain 
unit response to the CMB whilst minimising foreground contamination. The 
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construction of this map is described in detail in Bennett et al. (2003). The 
weighted map is produced by minimizing the variance of the temperature 
scale such that the weights add to one. To further improve the result, the in- 
ner Galactic plane is divided into 11 separate regions and weights determined 
separately. This takes account of the spatial variations in the foreground prop- 
erties. Thus, the final combined map does not rely on models of foreground 
emission and therefore any systematic or calibration errors of other experi- 
ments do not enter the problem. The final map covers the full-sky and the 
idea is that it should represent only the CMB signal. Following the release 
of the WMAP 1 yr data Tegmark, Oliveira-Costa & Hamilton (2003; TOH) 
produced a cleaned CMB map. They argued that their version contained 
less contamination outside the Galactic plane compared with the ILC map 
produced by the WMAP team. 

The ILC map is not intended for statistical analysis but in any case rep- 
resents a useful "straw man" for testing statistical techniques for robustness. 
To this end, we analyzed the behaviour of the random-walks representing 
spherical harmonic from Z = 1 to ^ = 600 in the WMAP ILC. Similar results 
are obtained for the TOH map so we do not discuss the TOH map here. 
For both variable-length (57) and unit-length (61) versions of the random- 
walk we generated 100000 Monte Carlo skies assuming Gaussian statistics. 
These were used to form a distribution of |Ri| (or |Ri|) over the ensemble 
of randomly-generated skies. A rejection of the null hypothesis (of stationary 
Gaussianity) at the a per cent level occurs when the measured value of the 
test statistic lies outstide the range occupied by a per cent of the random 
skies. 

Application of this simple test to the WMAP data (Stannard & Coles 
2004) does not strongly falsify the null hypothesis, which is not surprising 
given the simplicity of the measure we have used. The number of modes out- 
side the accepted range is close to that which would be expected if the null 
hypothesis were true. Notice that slightly more modes show up in the unit 
length case than in the other, perhaps indicating that the phase correlations 
that are known to exist in this data (Chiang et al. 2004) are masked if am- 
plitude information is also included. The most discrepant mode turns out to 
be / = 532 in both cases. For interest a plot of the random walk for this case 
is included as Figure 3. 

5 Topological Measures of Large-scale Structure 

The application of phase analysis is obviously all performed in harmonic 

space (whether Foiirier-harmonic or spherical harmonic) . But what does the 
presence of phase correlations mean for the morphology of large-scale struc- 
ture? What is the real-space morphology of a fluctuation field with random 
phases? In studying morphology, one is typically interested in the question of 
how the individual filaments, sheets and voids join up and intersect to form 
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the global pattern shown in Figure 1. Is the pattern cellular, having isolated 
voids surrounded by high -density sheets, or is it more like a sponge in which 
under- and over-dense regions interlock? 

Looking at 'slice' surveys gives the strong visual impression that we are 
dealing with bubbles; pencil beams (deep galaxy redshift surveys with a nar- 
row field of view, in which the volume sampled therefore resembles a very 
narrow cone or "pencil" ) reinforce this impression by suggesting that a line- 
of-sight intersects at more or less regular intervals with walls of a cellular 
pattern. One must be careful of such impressions, however, because of ele- 
mentary topology. Any closed curve in two dimensions must have an inside 
and an outside, so that a slice through a sponge like distribution will appear 
to exhibit isolated voids just like a slice through a cellular pattern. It is im- 
portant therefore that we quantify this kind of property using well-defined 
topological descriptors. 

In an influential series of papers, Gott and collaborators have developed a 
method for doing just this (Gott, Melott & Dickinson 1986; Hamilton, Gott 
& Weinberg 1986; Gott et al. 1989; Gott et al. 1990; Melott 1990; Coles et 
al. 1996). Briefly, the method makes use of a topological invariant known 
as the genus, related to the Euler-Poincare characteristic, of the iso-density 
surfaces of the distribution. To extract this from a sample, one must first 
smooth the galaxy distribution with a filter (usually a Gaussian is used; 
see §14.3) to remove the discrete nature of the distribution and produce a 
continuous density field. By defining a threshold level on the continuous field, 
one can construct excursion sets (sets where the field exceeds the threshold 
level) for various density levels. An excursion set will typically consist of a 
number of regions, some of which will be simply connected, e.g. a deformed 
sphere, and others which will be multiply connected, e.g. a deformed torus 
is doubly connected. If the density threshold is labelled by u, the number 
of standard deviations of the density away from the mean, then one can 
construct a graph of the genus of the excursion sets at as a function of v. 
we call this function G(z^). The genus can be formally expressed as an integral 
over the intrinsic curvature K of the excursion set surfaces, S^, by means of 
the Gauss-Bonnet theorem. 

The general form of this theorem applies to any two-dimensional manifold 
A4 with any (one dimensional) boundary dA4 which is piecewisc smooth. 
This latter condition implies that there are a finite number n vertices in the 
boundary at which points it is not differ entiable. The Gauss-Bonnet theorem 
states that 



where the ai are the angle deficits at the vertices (the n interior angles at 
points where the boundary is not differentiable) , kg is the geodesic curvature 
of the boundary in between the vertices and k is the Gaussian curvature of the 
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manifold itself. Clearly ds is an element of length taken along the boundary 
and dA is an area element within the manifold A4. The right-hand side of 
this equation ) is the Euler-Poincare characteristic, Xe of the manifold. 

This probably seems very abstract but the definition above allows us to 
construct useful quantities for both two and three-dimensional c;xamplcs. If 
we have an excursion set as described above in three-dimensions then its 
surface can be taken to define such a manifold. The boundary is just where 
the excursion sets intersect the limits of the survey and it will be taken to be 
smooth. Ignoring this, we see that the Euler-Poincare characteristic is just 
the integral of the Gaussian curvature over the all compact bits of the surface 
of the excursion set. Hence, in this case. 



Roughly speaking, the quantity G is the genus, which for a single surface is 
the number of "handles" the surface posesses; a sphere has no handles and 
has zero genus, a torus has one and therefore has a genus of one. For technical 
reasons to do with the effect of boundaries, it has become conventional not to 
use G but Gs = G — 1. In terms of this definition, multiply connected surfaces 
have Gs > and simply connected surfaces have Gs < 0. One usually divides 
the total genus Gs by the volume of the sample to produce gs, the genus per 
unit volume. 

One of the great advantages of using the genus measure to study large 
scale structure, aside from its robustness to errors in the sample, is that all 
Gaussian density fields have the same form of gs{i^)' 



where A is a spectrum-dependent normalisation constant. This means that, 
if one smooths the field enough to remove the effect of non-linear displace- 
ments of galaxy positions, the genus curve should look Gaussian for any 
model evolved from Gaussian initial conditions, regardless of the form of the 
initial power spectrum which only enters through the normalisation factor 
A. This makes it a potentially powerful test of non-Gaussian initial fluctua- 
tions, or of models which invoke non gravitational physics to form large scale 
structure. The observations support the interpretation that the initial condi- 
tions were Gaussian, although the distribution looks non-Gaussian on smaller 
scales. The nomenclature for the non-Gaussian distortion one sees is a 'meat- 
ball shift': non-linear clustering tends to produce an excess of high-density 
simply-connected regions, compared with the Gaussian curve. The opposite 
tendency, usually called 'swiss cheese', is to have an excess of low density 
simply connected regions in a high density background, which is what one 
might expect to see if cosmic explosions or bubbles formed the large-scale 
structure. What one would expect to see in the standard picture of gravi- 
tational instability from Gaussian initial conditions is a 'meatball' topology 
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when the smoothing scale is small, changing to a sponge as the smoothing 

scale is increased. This is indeed what seems to be seen in the observations 
so there is no evidence of bubbles; an example is shown in Figure 4. 
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Fig. 4. Genus curve for galaxies in the IRAS PSCz survey. The noisy curve is 
the smoothed galaxy distribution while the solid line is the best-fitting curve for a 
Gaussian field; from Canavezes et al. (1998). 

The smoothing required also poses a problem, however, because present 
redshift surveys sample space only rather sparsely and one needs to smooth 
rather heavily to construct a continuous field. A smoothing on scales much 
larger than the scale at which correlations are significant will tend to produce 
a Gaussian distribution by virtue of the central limit theorem. The power of 
this method is therefore limited by the smoothing required, which, in turn, 
depends on the space-density of galaxies. An example is given in the Figure, 
which shows the genus curve for the PSCz survey of IRAS galaxies. 

Topological information can also be obtained from two-dimensional data 
sets, whether these arc simply projected galaxy positions on the sky (such 
as the Lick map, or the APM survey) or 'slices' (such as the various CfA 
compilations). Here the excursion sets one deals with are just regions of the 
plane where the (surface) density exceeds some threshold. This method can 
also be applied to CMB temperature fluctuations where one looks at the 
topology of regions bounded by lines of constant temperature (Coles 1988; 
Gott et al. 1990; CoUey & Gott 2003; Komatsu et al. 2003). 

In such case we imagine the manifold referred to in the statement of the 
Gauss-Bonnet theorem to be not the surface of the excursion set but the 
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surface upon which the set is defined (i.e. the sky). For reasonably small 
angles this can be taken to be a flat plane so that the Gaussian curvature of 
M is everywhere zero. (The generalization to large angles is trivial; it just 
adds a constant curvature term.) The Euler characteristic is then simply an 
integral of the line curvature of around the boundaries of the excursion set: 



In this case the Euler Poincarc characteristic is simply the number of isolated 
regions in the excursion set minus the number of holes in such regions. 

This is analogous to the genus, but has the interesting property that it 
is an odd function of i> for a two dimensional Gaussian random field, unlike 
G(i/) which is even. In fact the mean value of x per unit area on the sky takes 
the form 



where B is a constant which depends only on the (two-dimensional) power 
spectrum of the random field. Notice that x < for < and x > for 
u > 0. A curve shifted to the left with respect to this would be a meatball 
topology, and to the right would be a swiss-cheese. 

There are some subtleties with this. Firstly, as discussed above, two- 
dimensional topology does not really distinguish between 'sponge' and 'swiss- 
cheesc' alternatives. Indeed, there is no two-dimensional eqiiivalent of a 
sponge topology: a slice through a sponge is topologically equivalent to a slice 
through swiss-cheese. Nevertheless, it is possible to assess whether, for exam- 
ple, the mean density level (z/ = 0) is dominated by underdense or overdense 
regions so that one can distinguish swiss-cheese and meatball alternatives to 
some extent. The most obviously useful application of this method is to look 
at projected catalogues, the main problem being that, if the catalogue is very 
deep, each line of sight contains a superposition of many three-dimensional 
structures. This projection acts to suppress departures from Gaussian statis- 
tics by virtue of the central limit theorem. Nevertheless, useful information 
is obtainable from projected data simply because of the size of the data sets 
available; as is the case with three-dimensional studies, the analysis reveals a 
clear meatball shift which is what one expects in the gravitational instability 
picture. The methods used for the study of two-dimensional galaxy cluster- 
ing can also be used to analyze the pattern of fluctuations on the sky seen in 
the cosmic microwave background. 

More recently, this approach has been generalized to include not just 
the Euler-Poincare distribution but all possible topological invariants. This 
means all quantities that satisfy the requirement that they be additive, con- 
tinuous, translation invariant and rotation invariant. For an excursion set 
defined in d dimensions there are d+1 such quantities that can be regarded 
as independent. Any characteristic satisfying these invariance properties can 
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be expressed in terms of linear combinations of these four independent quan- 
tities. These arc usually called Minkowski functionals. Their use in the anal- 
ysis of galaxy clustering studies was advocated by Mecke, Buchert & Wagner 
(1994) and has become widespread since then. 

In three dimensions there arc four Minkowski functionals. One of these 
is the integrated Gaussian curvature (equivalent to the genus we discussed 
above). Another is the mean curvature, H defined by 



In this expression i?i and R2 are the principal radii of curvature; at any point 
in the surface; the Gaussian curvature is 1/ {R1R2) in terms of these variables. 
The other two Minkowski functionals are more straightforward. They are the 
surface area of the set and its volume. These four quantities give a "complete" 
topological description of the excursion sets. 

6 Discussion 

In this paper I have tried to explain how phase correlations, arising from 
primordial non-Gaussianity, non-linear evolution (or indeed systematic error) 
can be measured and use to test cosmological models. The use of direct 
phase information is relatively new in cosmology, so I concentrated on basic 
properties and explained in some detail how phases relate to more familiar 
descriptors such as the bispectrum and three-point covariance functions. The 
magnitude of these statistical descriptors is of course related to the amplitude 
of the Fourier modes, but the factor that determines whether they are zero 
or non-zero is the arrangement of the phases of these modes. 

The connection between polyspcctra and phase information is an impor- 
tant one and it opens up many lines of future research, such as how phase 
correlations relate to redshift distortion and bias. Using small volumes of 
course leads to sampling uncertainties which are quite straightforward to deal 
with in the case of the power-spectra but more problematic for higher-order 
spectra like the bispectrum. Understanding the fluctuations about ensemble 
averages in terms of phases could also lead to important insights. On the 
other hand, the application of phase methods to galaxy clustering studies is 
complicated by the non-linear evolution of perturbations as they collapse and 
form bound structures. Structures which arc highly localized in real space are 
highly dispersed in Fourier space, so it is quite diSicult to disentangle any 
primordial phase correlations from artifacts of non-linear evolution. 

The CMB is a much more promising arena for the application of these 
methods. Late-time non-linear effects should be small (at least on large angu- 
lar scales) so any phase correlations will almost certainly arise from either pri- 
mordial effects or residual foreground contamination. The preliminary anal- 
ysis we have performed using the WMAP data shows that there are indeed 
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phase correlations, but Figure 2 suggests the likely interpretation of this is 
that it relates to the galaxy. As the constraints on early Universe physics get 
stronger, the importance of identifying low-amplitude foregrounds becomes 
all the more important. The next era of CMB physics is likely to be dom- 
inated by polarization studies where the effects of foregrounds are likely to 
be even more complicated. There remains a great deal to learn about how to 
fully characterize the polarization maps that will soon be obtained. We can 
be certain, however, that phase information (in one way or another) will help 
us understand what is going on. 



References 

1. Adlcr R.J., 1981, The Geometry of Random Fields. John Wiley & Sons, New 
York. 

2. Albrecht A., Steinhardt P. J., 1982, Phys. Rev. Lett., 48, 1220 

3. Banday A.J., Dickinson C, Davies R.D., Davies R.J., Gorski K.M. 2003, MN- 

RAS, 345, 897 

4. Bardeen J.M., Bond J.R., Kaiser N, Szalay A.S., 1986, ApJ, 304, 15 

5. Barrow J.D., Bhavsar S.P., Sonoda D.H., 1985, MNRAS, 216, 17 

6. Bartolo N., Matarrese S., Riotto A., 2002, Phys. Rev. D., 65, 103505 

7. Bennett C, et al. 2003, ApJS, 148, 1 

8. Bernardeau F., 1992, ApJ, 392, 1 

9. Bernardeau F., Colombi S., Gaztanaga E., Scoccimarro R., 2002, Phys. Rep., 
367, 1 

10. Canavezes A., et al. 1998, MNRAS, 297, 777 

11. Chandrasckhar S., 1943, Rev. Mod. Phys., 15, 1 

12. Chen C, Muklierjee P., KahniashviU T., Ratra B., Wang Y., 2004, ApJ, 611, 
655 

13. Chiang L.-Y., Naselsky P. D., Verkhodanov O. V., Way M. J., 2003, ApJ, 590, 
L65 

14. Chiang L.-Y., 2001, MNRAS, 325, 405 

15. Chiang L.-Y., Coles P., 2000, MNRAS, 311, 809 

16. Chiang L.-Y., Coles P., Naselsky P., 2002, MNRAS, 337, 488 

17. Chiang L.-Y., Naselsky P.D., Coles P., 2004, ApJ, 602, LI 

18. Coles P., 1988, MNRAS, 238, 509 

19. Coles P., 1993, MNRAS, 262, 1065 

20. Coles P., Barrow J.D., 1987, MNRAS, 228, 407 

21. Coles P., Chiang L.-Y., 2000, Nature, 406, 376 

22. Coles P., Davies A.G., Pearson R.C., 1996, MNRAS, 281, 1375 

23. Coles P., Dineen P., Earl J., Wright D., 2004, MNRAS, 350, 989 

24. Coles P., Jones B.J.T., 1991, MNRAS, 248, 1 

25. Coles P., Lucchin F., 2002, Cosmology: The Origin and Evolution of Cosmic 
Structure, 2nd Edition. John Wiley & Sons, Chichester 

26. CoUey W.N., Gott J.R., 2003, MNRAS, 344, 686 

27. de Oliveira-Costa A., Tegmark M., Zaldarriaga M., Hamilton A.J.S., 2004, 
Phys. Rev. D., 69, 063516 



Phase Correlations and Topological Measures of Large-scale Structure 



29 



28. de Oliveira-Costa A., Tegmark M., Davies R.D., Gutierrez CM., Lasenby A.N., 

Rebolo R., Watson R.A., 2004, ApJ, 606, L89 

29. Dinoen R, Coles P., 2004, MNRAS, 348, 52 

30. Dinoen P., Rocha C, Coles, P., 2004, MNRAS, in press, astro-ph/0404356 

31. Efstathiou G., 2004, MNRAS, 348, 885 

32. Eriksen H. K., Hansen F. K., Banday A. J., Gorski K. M., Lilje P. B., 2004, 

ApJ, 605, 14 

33. Falk T., Rangarajan R., Srednicki M., 1993, ApJ, 403, LI 

34. Ferreira P.G., Magueijo J., Gorski K.M., 1998, ApJ, 503, LI 

35. Fisher N.L, 1993, Statistical Analysis of Circular Data. Cambridge University 
Press, Cambridge. 

36. Fry J.N., 1984, ApJ, 279, 499 

37. Gangui A., Lucchin F., Matarrese S., MoUerach S., 1994, ApJ, 430, 447 

38. Gott J.R., Melott A.L., Dickinson M.L., 1986, ApJ, 341 

39. Gott J.R. et al. 1989, ApJ, 340, 625 

40. Gott J.R., Park C, Juszkiewicz R., Bies W.E., Bennet D.P., Stebbins A., 1990, 
ApJ, 352, 1 

41. Guth A. H., 1981, Phys. Rev. D., 23, 347 

42. Guth A. H., Pi S.-Y., 1982, Phys. Rev. Lett. 49, 1110 

43. Hajian A., Souradeep T., 2003, ApJ, 597, L5 

44. Hajian, Souradeep T., Cornish N., 2004, preprint, astro-ph/0406354 

45. Hamilton A.J.S., Gott J.R., Weinberg D.H., 1986, ApJ, 309, 1 

46. Hansen F.K., Marinucci D., Vittorio N., 2003, Phys. Rev. D., 67, 123004 

47. Hinshaw G. et al. 2003, ApJ, 148, 

48. Hobson M.P., Jones A.W., Lasenby A., 1999, MNRAS, 309, 125 

49. Hughes B.D., 2004, Random Walks and Random Environments. Volume 1: 
Random Walks. Oxford University Press, Oxford. 

50. Jain B., Bertschinger E., 1998, ApJ, 509, 517 

51. Kendall M., Stuart A., 1977, The Advanced Theory of Statistics, Vol. 1. Griffin 
& Co, London 

52. Komatsu E. et al. 2003, ApJ, 148, 119 

53. Komatsu E., Spergel D.N., 2001, Phys. Rev. D., 63, 063002 

54. Koyama K., Soda J., Taruya A., 1999, MNRAS, 310, 1111 

55. Kuhn J.R., Uson J.M., 1982, ApJ, 263, L47 

56. Kuiper N.H., 1960, Koninklijke Nederlandse Akademie Van Wetenschappen, 
Proc. Ser. A, LXIII, pp. 38-49 

57. Larson D.L., Wandelt B.D., 2004, ApJ, 613, L85 

58. Linde, A.D., 1982, Phys. Lett. B., 108, 389 

59. Lohmann A.W., Weigelt G., Wirnitzer B., 1983, Appl. Optics, 22, 4028 

60. Luo X., 1994, ApJ, 427, L71 

61. Luo X., Schramm D.N., 1993, ApJ, 408, 33 

62. Matarrese S., Verde L., Jimenez R., 2000, ApJ, 541, 10 

63. McCrea W.H., Whipple F.J.W., 1940, Proc. R. Soc. Edin., 60, 281 

64. Mecke K.R., Buchert T., Wagner H., 1994, A& A, 288, 697 

65. Melott A.L., 1990, Phys. Rep., 193, 1. 

66. Moscardini L., Matarrese S., Lucchin F., Messina A., 1991, MNRAS, 248, 424 

67. Naselsky P.D., Chiang L.-Y., Olesen P., Verkhodanov O., 2004, astro- 
ph/0405181 

68. Naselsky P.D., Doroshkevich A.G., Verkhodanov O., 2003, ApJ, 599, L53 



30 Peter Coles 



69. Park, C.-G., 2004, MNRAS, 349, 313 

70. Peacock J.A., Dodds S., 1996, MNRAS, 267, 1020 

71. Pearson K., 1905, Nature, 72, 294 

72. Peebles P.J.E., 1980, The Large-Scale Structure of the Universe. Princeton 
University Press, Princeton N,J. 

73. Peebles P.J.E., 1999a, ApJ, 510, 523 

74. Peebles P.J.E., 1999b, ApJ, 510, 531 

75. Percival W.J. et al. 2001, MNRAS, 327, 1297 

76. Rayleigh, 1905, Nature, 72, 318 

77. Rudnick J., Beldjenna A., Gaspari G., 1987, J. Math. Phys. A, 20, 971 

78. Ryden B.S., Gramann M., 1991, ApJ, 383, L33 

79. Salopck D.S., 1992, Phys. Rev. D., 45, 1139 

80. Salopek D.S., Bond J.R., Phys. Rev. D., 42, 3936 

81. Scherrer R.J., Melott A.L., Shandarin S.F., 1991, ApJ, 377, 29 

82. Schmalzing J., Gorski K.M., 1998, MNRAS, 297, 355 

83. Scoccimarro R., Colombi S., Fry J.N., Fricman J. A., Hivon E., Melott A.L., 
1998, ApJ, 496, 586 

84. Scoccimarro R., Couchman H.M.P., Frieman J.A., 1999, ApJ, 517, 531 

85. Shandarin S.F., 2002, MNRAS, 331, 865 

86. Smoot G.F. ot al. 1992, ApJ, 396, LI 

87. Soda J., Suto Y., 1992, ApJ, 396, 379 

88. Stannard A.J., Coles P., 2004, MNRAS, submitted, astro-ph/0410633 

89. Starobinsky A. A., 1979, Pis'ma Zh. Eksp. Teor. Fiz., 30, 719 

90. Starobinsky A. A., 1980 Phys. Lett. B., 91, 99 

91. Starobinsky A. A., 1982, Phys. Lett. B., 117, 175 

92. Stirling A.J., Peacock J.A., 1996, MNRAS, 283, L99 

93. Tegmark M., de Oliveirar-Costa A., Hamilton A.J.S., 2003, Phys. Rev. D., 68, 
123523 

94. Varshalovich D.A., Moskalev A.N., Khersonskii V.K., 1988, Quantum Theory 
of Angular Momentum. World Scientific, Singapore. 

95. Vielva P., Martinez-Gonzalez E., Barreiro R.B., Sanz J.L., Cayon L., 2004, ApJ, 
609, 22 

96. Verde L., Heavens A.F., 2001, ApJ, 553, 14 

97. Verde L., Jimenez R., Kamionkowski M., Matarrese S., 2001, MNRAS, 325, 
412 

98. Verde L., Wang L., Heavens A.F., Kamionkowski M., 2000, MNRAS, 313, 141 

99. Verde L. et al. 2002, MNRAS, 335, 432 

100. Wang L., Kamionkowski M., 2000, Phys. Rev. D., 61, 063504 

101. Watts P.I.R., Coles P., 2002, MNRAS, 338, 806 

102. Watts P.LR., Coles P., Melott A.L., 2003, ApJ, 589, L61 



